Design, synthesis and experimental validation of novel potential chemopreventive agents using random forest and support vector machine binary classifiers
Abstract
Compared to the current knowledge on cancer chemotherapeutic agents, only limited information is available on the ability of organic compounds, such as drugs and/or natural products, to prevent or delay the onset of cancer. To evaluate the chemopreventive potential of chemicals and to design novel chemopreventive agents with low to no toxicity, we developed predictive computational models for chemopreventive agents in this study. First, we curated a database containing over 400 organic compounds with known chemopreventive activities. Based on this database, various random forest and support vector machine binary classifiers were developed. All of the resulting models were validated by cross-validation procedures. The validated models were then applied to virtually screen a chemical library containing around 23,000 natural products and derivatives. We selected a list of 148 novel candidate chemopreventive compounds based on the consensus prediction of all validated models. We further analyzed the predicted active compounds by their ease of organic synthesis. Finally, 18 compounds were synthesized and experimentally validated for their chemopreventive activity. The experimental validation results paralleled the cross-validation results, demonstrating the utility of the developed models. The predictive models developed in this study can be applied to virtually screen other chemical libraries to identify novel lead compounds for the chemoprevention of cancers.
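The consensus selection step mentioned in the abstract can be illustrated with a small sketch. The abstract does not state how the consensus was computed, so this example assumes the strictest reading: a compound is kept only if every model labels it active. The function name `consensus_actives`, the model names, and the compound identifiers are all hypothetical, and the labels are toy values rather than data from the study.

```python
from typing import Dict, List


def consensus_actives(predictions: Dict[str, Dict[str, int]]) -> List[str]:
    """Return compound IDs that every model labels as active (1).

    `predictions` maps a model name to a {compound_id: label} dict,
    where label 1 means predicted active and 0 predicted inactive.
    Unanimous agreement is an assumption made for this sketch.
    """
    if not predictions:
        return []
    models = list(predictions.values())
    # Only compounds scored by every model can reach a consensus.
    shared = set(models[0])
    for model in models[1:]:
        shared &= set(model)
    return sorted(c for c in shared if all(m[c] == 1 for m in models))


# Toy binary predictions from three hypothetical validated classifiers.
toy_predictions = {
    "rf_model_a": {"cpd1": 1, "cpd2": 0, "cpd3": 1},
    "svm_model_a": {"cpd1": 1, "cpd2": 1, "cpd3": 1},
    "svm_model_b": {"cpd1": 1, "cpd2": 1, "cpd3": 0},
}

hits = consensus_actives(toy_predictions)
print(hits)
```

A majority vote would be a looser alternative; unanimity trades recall for fewer false positives, which suits a screen whose hits must then be synthesized and tested.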
Journal title:
- Journal of computer-aided molecular design
Volume 28, Issue 6
Pages -
Publication date 2014